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Abstract 

We present the software library marathon , which is designed to support the anal¬ 
ysis of sampling algorithms that are based on the Markov-Chain Monte Carlo 
principle. The main application of this library is the computation of proper¬ 
ties of so-called state graphs, which represent the structure of Markov chains. 
We demonstrate applications and the usefulness of marathon by investigating 
the quality of several bounding methods on four well-known Markov chains for 
sampling perfect matchings and bipartite graphs. In a set of experiments, we 
compute the total mixing time and several of its bounds for a large number of 
input instances. We find that the upper bound gained by the famous canonical 
path method is often several magnitudes larger than the total mixing time and 
deteriorates with growing input size. In contrast, the spectral bound is found 
to be a precise approximation of the total mixing time. 


Introduction 

The task of random sampling is to return a randomly selected object from a typ¬ 
ically large set of objects according to a specified probability distribution. Such 
tasks often arise in practical applications like network analysis, where properties 
of a certain network of interest are to be compared with those of a random null 
model network m- Another application is approximate counting of combinato¬ 
rial objects. While this is typically a hard problem, the number of solutions of a 
self-reducible problem, however, can be approximated in polynomial time using 
randomly sampled objects [3j. We consider sampling methods which follow the 
so-called Markov-Chain Monte Carlo (MCMC) approach. A MCMC algorithm 
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can be considered as a random walk on a directed graph where the vertices of 
this graph correspond to the set of objects to be sampled from. Starting at 
an arbitrary vertex, we go to a randomly chosen adjacent vertex and continue 
our random walk from this vertex; after some time, we stop the random walk. 
The object representing the final vertex is returned as a random sample. An 
overview of the surprisingly versatile applications of the MCMC approach was 
given by Diaconis |3j. 

Motivation In general, an infinite number of steps will lead to a truly random 
sample. But what happens if we stop the random walk after a finite number of 
steps? The number of steps which are necessary to sample from a probability 
distribution which is close to the desired distribution is known as the total 
mixing time of a Markov chain, and is of central interest for the applicability of 
an MCMC algorithm. There are several methods which are able to gain upper 
bounds on the total mixing time like the canonical path method. Sinclair [5] 
gives an excellent overview about different bounding techniques. Sometimes, 
such bounding methods can successfully be applied to Markov Chains to gain 
upper bounds on their total mixing times. When applied successfully, such 
upper bounds most often are high-degree polynomials on the size of the input. 
This is far too large to be applicable in practice, where more than a linear 
number of steps is infeasible for large input size. In fact, software tools like 
mfinder [I], which can be used for motif search in large networks, use a MCMC 
sampling approach with a linear number of steps as default for generation of 
null models. This leads to the problem that the sampling result might not be 
as random as desired, resulting in a non-optimal or even incorrect behaviour of 
the application. 

On the other hand, it may be the case that upper bounds on the total 
mixing time are just too pessimistic. From a purely theoretical perspective 
it is often already a breakthrough to establish that a given Markov chain is 
rapidly mixing, that is, to establish a polynomial mixing time bound. Since 
the bounding techniques in Markov chain analysis are often fairly general and 
worst-case instances in terms of total mixing time are not known, it is not clear, 
whether the upper bounds gained by such methods are tight for some worst- 
case instance. However, for practical applicability, it is of eminent importance 
to find as sharp bounds as possible. Up to now, there is very little knowledge 
about the gap between the so established upper bounds and tight bounds for 
actual worst-case instances. Probably most researches in this field will suspect 
that a considerable real gap exists, but for specific Markov chains it is in general 
completely open how many orders of magnitude this gap may be large. By this 
reason, we believe that the true total mixing time might be much lower than 
proven by theoretical methods. Therefore, our working questions are: a) Is 
the total mixing time really as large as the bounding methods propose, or is 
it possible that the bounding methods are just not precise enough to capture 
the real total mixing time? b) Which bounding method has the best potential 
and could lead to better results when further information about the structure 
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of state graphs is given? 


Contribution In trying to answer these questions, we developed the C++ 
library marathon, which has been designed to compute several structural prop¬ 
erties of Markov chains and its corresponding state graphs. The library offers 
the following features: 

• Construction of state graphs from arbitrary input instances for a set of 
user-definable transition rules. 

• Computation of structural properties of state graphs. 

— Computation of the total mixing time for arbitrary e. 

— Computation of the congestion bound gained by the canonical path 
method with an arbitrary, user-definable, path construction scheme. 

— Computation of the upper and lower spectral bound. 

— Computation of network properties like diameter and average path 
length. 

We built this library to easily add algorithms for network analysis. We expect 
that our library could be helpful both for theoretical scientists as well as for 
practitioners, who implement MCMC algorithms and have to choose an appro¬ 
priate number of steps. 

As a demonstration on what kind of research can be done with the help of 
marathon, we analysed structural properties and investigated the quality of the 
total mixing time bounds on the example of four well-known Markov chains. 
From our experiments, we gained several insights. 

• We found that the congestion bound is multiple times larger than the 
corresponding total mixing time for almost all instances. In addition, 
the quality of the congestion bound deteriorates with growing input size. 
This indicates that the congestion bounds are bad approximations of the 
total mixing time, and, since this bound is often used for theoretical anal¬ 
ysis, that the high-degree polynomial bounds from theory may be too 
pessimistic. 

• In contrast, the upper spectral bound is close to the total mixing time for 
all observations. Even if its quality also deteriorates with growing instance 
size, the spectral bounds keeps close to the total mixing time much longer 
than the congestion bound. 

• The lower spectral bound can be used as a very good approximation for all 
input instances we investigated in this work. The data suggests an almost 
linear relationship between the lower spectral bound and total mixing 
time. 
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• We figured out several new structural insights about the Markov chains we 
investigated. For example, we found that the the second largest eigenvalue 
in magnitude is almost always positive. The total mixing time of instances 
of the same input size can be very diverse and depends on the vertex degree 
in the state graphs. This shows that the experimental approach provided 
by our library may lead to structural insights which may be useful to get 
new intuitions for developing more precise bounding techniques. 

From our experiments we can conclude that the canonical path method will not 
lead to tight bounds of the total mixing time, even when further information 
about a state graphs structure is included. Instead, developing new methods 
based on the spectral gap seems to be more promising. 

Structure The remainder of this article is structured as follows: In the next 
section we give a brief introduction into the theory of Markov chains, in particu¬ 
lar, in the concepts of total mixing time and its bounds. Thereafter, we present 
the main features of the marathon library. In the second part of this article, we 
demonstrate the applicability of the library in a set of experiments, while trying 
to answer the questions described above. The results of these experiments are 
shown and discussed in the final section. 


Methods and Materials 

An important tool for understanding MCMC based sampling processes are 
Markov chains. To make the following sections more understandable, we will 
briefly introduce the most important concepts of the theory of Markov chains. 
For a less steep introduction into the topic, we recommend the lecture notes 
of Sinclair [6], the textbook by Levin, Peres and Wilrner 0 , and the survey of 
Lovasz [8]. 

Theory of Markov Chains 

A Markov chain can be seen as a random walk on a set of combinatorial 
objects, the so-called states. Two states x,y £ Q can be connected via a 
transition arc ( x,y ) £ , when x can be transformed into y via small local 

changes. This definition induces a so-called state graph T = (f2, T), repre¬ 
senting the objects and their adjacencies. We define for each pair of states 
x, y £ II the so-called proposal probability n: Q x —>• [0,1] and a weight func¬ 
tion w: ft — > R + . A step from x to y in a random walk is done with transition 
probability P(x,y) = n(x,y)-mm(l,w(y)/w(x)). The matrix P = (P(x,y) XtV& ci) 
is called transition matrix. 

Algorithm [T| shows the classical random walk based sampling method. A 
random state b £ 17 is returned, sampled from a probability distribution pi! 1 
which describes a state’s probability after t steps when starting at state a £ f2. 
The fundamental theorem for Markov chains (see for example 13 ) says that the 
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distribution p$ converges for t — > oo to the unique stationary distribution 7 r 
with 7 r(y) = ^ ^ ilfo) f° r a ll V 6 ^ and initial states a G fl, if the state graph T 
is non-bipartite, connected and reversible with respect to 7r, i.e. 7 t(x)P(x, y) = 
ir(y)P(y,x) for each pair of states x,y £ ft. Markov chains which fulfil these 
properties are called ergodic. 


Algorithm 1 Random Walk 
Input: State a G 12, number t of steps. 

Output: State b £ Cl, according to probability distribution pa 1 . 

1: x t— a 

2: for i i — 1 to t do 

3: Neighbour selection: Pick a neighbour y of x with probability n(x,y). 

4: Metropolis rule: x t— y with probability min ^1, ■ 

5: end for 
6: return x 


Total Mixing Time The total variation distance measures the distance be¬ 
tween two probability distributions p and rj: 

\\»-v\\ ■= \ (i) 

We define r a (e) := min{t € N: || pi^ — 7r|| < e} as the minimal number of steps 
a random walk has to take to reach a distribution which is close to e to its 
stationary distribution when starting at state a G ft. The total mixing time 
r(e) of a Markov chain is defined as 

r(e) := max{r Q (e)}. (2) 

A Markov chain is known as rapidly mixing , if r(e) can be bounded from above 
by a polynomial which depends on the input size n and the parameter e _1 . The 
total mixing time can be bounded by several techniques. We briefly present two 
widely used bounding methods which we investigate in the remainder of this 
article. 

Spectral Bound Let 1 = Ax > A 2 > ... > A|q| > —1 be the real eigenvalues 
of the transition matrix P and let A max be defined as A max = max{|A 2 |, |Aq|}. 
The total mixing time can be bounded by the lower and upper spectral bound [5], 
i.e., 


r(e) < (1 — A max ) 1 ■ (ln(e J ) + MttJJ) 

r(e) > ^A max (l — A max ) -1 ln(2e) _1 , 
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( 3 ) 

( 4 ) 








where 7 : m i n denotes the smallest component of n. The transition matrix P is 
not necessarily symmetric. However, it can be transformed into a symmetrical 
matrix: 

P S ym = D~ ll 2 -P-D 1 / 2 , (5) 

where D is the |f2| x |fl| diagonal matrix with the components of 7r on its main 
diagonal. This transformation relies on the reversibility of a Markov chain. It 
is a classical result that P sy m has the same eigenvalues as P. 


Congestion Bound Sinclair’s multi commodity flow method [5] is often used 
for bounding the total mixing time. Let V = (J,, '^ >x v be a family of simple 
paths in T, each V xy consisting of simple paths between x and y £ O. The 
maximum load congestion p , with respect to V. is then defined as 


P{V) 


max 


7 t(u)P(u 7 v) 


( 6 ) 


For any system of paths V the total mixing time of a reversible Markov chain 
can be bounded by the congestion bound , i.e., 


t(c) < p{V)-(\n{e 1 ) + \n(TrJ n )) . (7) 


The quality of the congestion bounds depend on the path construction scheme 
V. The congestion bound is often used to gain theoretical bounds on the total 
mixing time. 


The marathon library 

In this section, we introduce the main features of the marathon library. Our 

and freely available at 
To install, just follow the instruc¬ 
tion manual at github. Several example programs are available. 

The marathon library is designed for the study of structural properties of 
Markov chains, respectively their corresponding state graphs. One of its current 
main applications is the computation of the total mixing time and several of its 
bounds. The suggested way to use our library is to implement the transition 
rules of a Markov chain and conduct some experiments to quickly learn some 
properties which are typically hard to find in theory. In particular, we allow the 
computation of the eigenvalue A max and the application of the canonical path 
method to compute the congestion bound with some path congestion scheme. 
This way, one can quickly evaluate whether a scheme captures the total mixing 
time closely or if there is a noticeable gap. The marathon library has been 
designed with two main goals in mind: Performance and Extensibility. Aiming 
for the first goal, we use the C++ programming language, so that various highly 
efficient third party libraries become available. In particular, we use CUDA [9] 
and OpenMP [10] to accelerate compute-intense tasks with the use of multi¬ 
core processors and highly parallel graphic processing units. To achieve the 


source code is published under MIT licence 
https://github.com/srechner/marathon. 
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second goal, we designed our library for easy extensibility. Our representation 
of the state graphs is versatile enough to allow the integration of arbitrary graph 
algorithms. Moreover, we provide interfaces to simplify the programming effort 
and to hide complexity from the user. For example, adding a new Markov chain 
is done in a few steps: 

1. Implement a data structure which represents a state, here simply called 
State. 

2. Create a class which implements the MarkovChain interface, using State 
as a template parameter. The following virtual methods have to be over¬ 
ridden: 

• The method MarkovChain: : arbitraryState, constructs an arbi¬ 
trary state from a given input string (e.g. construct a bipartite graph 
from a given bipartite degree sequence). 

• The method MarkovChain: neighbours, takes an object u of type 
State and creates a list of adjacent states N u whose elements are the 
result of all valid random choices within u plus their corresponding 
proposal probability n, according to the set of transition rules. 

In many cases this can be done within about 200 lines of code with a high 
amount of re-usability. Example chains have been included within the library. 
We will now give a detailed description of the algorithmic ideas behind the most 
important methods available in the library. 

Construction of the State Graph Given an instance representation (e.g. 
a degree sequence), the first step of the analysis is the construction of the state 
graph for the given instance. The purpose of this step is to gain a sparse 
representation of the state graph T = (fi, fH). Initializing f 1 with an arbitrary 
state generated by the MarkovChain: : arbitraryState method, we perform a 
full graph scan to iteratively enumerate the set of states and transition arcs. In 
each step, we select a state u £ Q and enumerate all its adjacent states N u via 
the MarkovChain: neighbours method. We add a transition arc (u,v) with 
proposal probability k(u, v ) to 4 ' for each adjacent state v £ N u and add v 
to f2, if not already included. At the end of this step, we have a complete list 
of states and an adjacency list representation of the transition arcs 4'. The 
probability of each transition (u,v) £ 4' is subsequently computed as k(u,v) ■ 
min(l, w(v)/w(u)), where the function w: —> R" 1 " can optionally be overridden 

to enable the Metropolis rule (see Algorithm [TJ. The construction of the state 
graph relies on connectedness of the state graph, which is given whenever the 
Markov chain is irreducible. The construction of the state graph is a sequential 
process, consuming 0(|fl| + |4i|) time and memory. 

Computation of the Total Mixing Time We compute the total mixing 
time r(e) of a state graph based on the following idea: Let P be the transition 
matrix of an ergodic Markov chain and let a £ fl be an initial state. One step 
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of the random walk changes the probability distribution p^ to pa +1 ^ = Pa' 1 • P. 
Iterating this argument, the distribution after t steps is given by 

P^=V^-P\ ( 8 ) 

where is a row vector whose components are pi°\a) = 1 for the initial 
state a and pi°'(*) = 0, for i ^ a. So, each row of the matrix P* gives the 
distribution to a certain pa ' for a £ f2. The total mixing time r(e) can therefore 
be computed from P by iterated matrix multiplication. The principle can be 
described as follows: We transform the adjacency list representation of V into 
its adjacency matrix representation. The entries Pij of this matrix represent 
the transition probabilities for going from state i to state j, so this matrix is the 
transition matrix described above. Since the total variation distance decreases 
monotonically with t, we can find r(e) with a two-step procedure. 

1. Starting with i = 0, we iteratively increment i until max oe n ||pa — 7r|| < 
e. This requires, in fact, just a sequence of i — (log 2 (T(e))] matrix squaring 
operations to compute the matrices P 2 , P 4 , P 8 ,..., P 2 . At the end of 
step one, we know that the total mixing time r(e) lies in the range 2 Z_1 < 
r(e) < 2\ 

2. We use binary search to find r(e). We define two variables l = 2* _1 
and r = 2* representing respectively lower and upper bounds on r(e). In a 
sequence of \og 2 (r—l) = [log 2 (T(e))~| —1 iterations, we half the search space 
in each step by computing the total variation distance max ae n Hpi™'* — 7r|| 
with m = \_(l + r)/2j from P m . Although P m could, in principle, be 
computed from P with binary exponentiation, we save additional matrix 
multiplications by reusing P l to compute P m = p l p m ~ l . Maintaining 
the invariant max ae n|ba^ — 7r|| > e > max a6 n||pi^ — 7r||, we update 
the lower bound l or the upper bound r, depending, whether the total 
variation distance is larger than e or not. We stop when l > r. The value 
of r is the total mixing time. 

Computing the total mixing time requires 0(|II| 2 ) memory and 0(log(r(e)) • 
|f2|“) time, where w is the exponent of the matrix multiplication algorithm. (In 
most libraries, oj is equal to 3. However, the matrix multiplication implemen¬ 
tation can be exchanged by using another implementation when building the 
library.) 

Computing the total mixing time is currently by far the most time and mem¬ 
ory consuming operation in our library. Due to quadratic memory requirement, 
it is applicable only for small state graphs with |H| < 30000 (depending on main 
memory). Since this method is also very time-consuming, we provide three im¬ 
plementations, which can be used if the appropriate hardware is available: 

a) A classical CPU implementation on the basis of openBLAS [Tl] , which 
runs in parallel on multi-core CPUs. This method should be used for very 
small instances, or if no CUDA-capable GPU is available. 


b) A cuBLAS 1121 based GPU implementation which can be significantly 
faster than the CPU implementation. Because of the lack of memory on 
most GPUs, this implementation is designed for small state graphs with 
|f2| < 15000 (depending on GPU memory). 

c) A CPU-GPU hybrid implementation using the cuBLASXt !I3] library 
for matrix multiplication. This implementation should be the method of 
choice if a CUDA capable GPU is available. This method is typically 
faster than the CPU implementation and has the additional advantage 
that the CPU is free to be used for other computations. 

The three implementations share the same asymptotical running time, but dif¬ 
fer greatly in practice (see Fig. [Qfor a comparison of performance). Since most 
of the work is matrix multiplication, this comparison almost directly maps to 
a performance comparison between the corresponding libraries for matrix mul¬ 
tiplication. The 0(log(r(e))) invocations of the total variation distance com¬ 
putation, each with time complexity of 0(|fl| 2 ), contribute only a little to the 
running time. 
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Figure 1. Single and double precision performance of the total 
mixing time computation. The charts show the running time for the 
computation of the total mixing time on the example of five state graphs of 
size 8012 to 20358. Due to the relatively small amount of GPU memory on our 
test system, only the first four (respectively two) state graphs could be 
processed by the GPU implementation in single precision mode (respectively 
double precision mode). The running times were measured on an Ubuntu 
14.04 system with a Intel Xeon E3-1231, NVIDIA GeForce GTX 970 (4 GB 
GPU memory) and 16 GB of main memory, using gcc in version 4.8.4 and 
CUDA in version 7.0. 


Computation of the Spectral Bound The difference between the largest 
and the second largest eigenvalue (1 — A max ) of a Markov chain’s transition ma¬ 
trix is known as the spectral gap and is of central interest in mixing time analysis. 
To compute this quantity, we use the well-known ARPACK++ JT4] library to 
compute the two real eigenvalues with the largest magnitude of the transition 
matrix P. In case of non-symmetric transition rules, we first transform the 
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transition matrix P into the symmetrical matrix P sym via Equation (fol) . Af¬ 
ter computing both eigenvalues with the largest magnitude, we use the second 
eigenvalue in order to compute the upper and lower spectral bound via Inequal¬ 
ities © and ©. Since this method requires only a sparse representation of the 
transition matrix, it is also applicable for large state graphs. 

Computing the Congestion Bound As stated in the introduction, we are 
also interested in the quality of other bounding techniques. In particular, we are 
interested in the canonical path method which is often used to gain theoretical 
bounds. We implemented a method that can be used to gain an upper bound 
on the mixing time by computing the maximal congestion of a user-definable 
path construction scheme. Four each pair of states (u, v) £ SI x fl we apply the 
path construction scheme to construct a path p from state u to state v. The 
congestion of each transition arc lying on this path is increased by |p|7r(u)7r(i;) 
(see Equation ©). Since the construction of |fi| 2 paths can be done completely 
independently of each other, we use OpenMP to construct all paths in parallel. 
The maximal congestion of any transition arc is used for computing the upper 
bound via Inequality ©. This method has a time complexity of 0(|fl| 2 ) and a 
memory complexity of 0(|T|). 

Network Analysis As state graphs can be seen as weighted directed graphs, 
all kinds of graph algorithms can be integrated into marathon. As an example, 
we added functions for computing the diameter of a state graph, as well as 
functions for computing the average path length. 

Exemplary Markov-Chains 

To demonstrate possible applications of marathon , we will describe a set of 
experiments in the following section. In our experiments, we focus on two famous 
sampling problems. The problem of uniformly sampling a perfect matching in a 
bipartite graph and of uniformly sampling a bipartite graph realization for given 
vertex degrees. Both problems are based on bipartite graphs. A bipartite graph 
G = (17, V, E) is an undirected graph with disjoint vertex sets V and U and a 
set of edges E C U xV , connecting vertices from U with V. We will refer to the 
vertices of U as iq, where i ranges from 1 to n := \U\ and to the vertices of V as 
Vj, where j ranges from 1 to n' := \V\. We analysed four different chains that 
can be used for these sampling problems. To make this article self-contained, 
we give a introduction to the chains. 

Markov Chains for Sampling of Perfect Matchings 

A perfect matching in a bipartite graph G = (U, V, E), with n = n', is a subset 
MCE of edges such that a) all edges in M are non-adjacent, and b) \M\ = n. 
Uniformly sampling a perfect matching is the problem of choosing one perfect 
matching from the set of all perfect matchings of G uniformly at random. We 
describe two important Markov chains which are known to solve the problem of 
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uniformly sampling perfect matchings. These chains origin from applications in 
statistical physics and are widely known in the field of Markov chain analysis, 
so they make good examples for our analysis. Both Markov chains use the set of 
near-perfect matchings as auxiliary states. A near-perfect matching is a subset 
of non-adjacent edges from E , but with one edge less than a perfect matching. 
So, being specific, the Markov chains used for sampling are based on the set 
f 1m of perfect and near perfect matchings in G. 

Matching Chain One Broder m introduced a Markov chain which can be 
described as follows. In state M £ VLm we pick a candidate state M' (line three 
in Algorithm [T]) by choosing an edge e = {it, u} from E uniformly at random. 
One of five cases occurs: 

1. If M is a perfect matching and e £ M, then remove e from M and select 
M' £- M \{e}. 

2. If M is a near-perfect matching and e 0 M, then add e to M and select 
M' £- MU{e}. 

3. If M is a near-perfect matching, u is not matched in M, and an edge 
e' = {«;, v} £ M exists, then remove e' from M, add e and select M' <— 
(M\{e'})U{e}. 

4. Symmetrically, if M is a near-perfect matching, v is not matched in M, 
and an edge e! = {u, z} £ M exists, then remove e! from M, add e and 
select M' £- ( M \ {e'}) U {e}. 

5. In all other cases, stay at M. 

The state M' is proposed as a candidate state with proposal probability k(M, M') 
l/|i?| in line three of Algorithm [TJ The proposed state M' is accepted or re¬ 
fused by line four of Algorithm |TJ using unit weights. So, this chain converges 
to a uniform stationary distribution. In the remainder of this article, we refer 
to this Markov chain as Matching Chain One. Jerrum and Sinclair nu proved 
that Matching Chain One is rapidly mixing for a graph class where the ratio of 
near-perfect matchings A/”(G) and perfect matchings A4 (G) in G is polynomially 
bounded. Being specific, its total mixing time can be bounded from above: 

r(e) < 16 2 |£| 2 "MM • e" 1 )- (9) 

The class of dense bipartite graphs, i.e. bipartite graphs G = (XJ , W, E) with 
|!/| = | VC | = n and with minimal vertex degree of at least n/ 2, was shown as a 
class for which the ratio of near-perfect and perfect matchings is polynomially 
bounded m ■ For this class of graphs, the following upper bound on the total 
mixing time can be gained HZj: 

r(e) £ 0(n 7 ln(|0 M | • e -1 ))- (10) 
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A main problem of Matching Chain One is the fact that it does not only generate 
perfect matchings but also near-perfect matchings. In many graphs with 2 n 
vertices the ratio of the number of near-perfect and perfect matchings cannot be 
bounded by a polynomial which is dependent on n. In such cases, the sampling 
of a perfect matching needs an exponential number of trials to see a perfect 
matching at all. 


Matching Chain Two One way to overcome the problem of an exponential 
total mixing time has been given by Jerrum, Sinclair and Vigoda |18l . Their 
main idea is to use a carefully chosen weight function w to sample from a non- 
uniform distribution. This way, they gained a rapidly mixing Markov chain 
which can be used to sample perfect matchings from any given bipartite graph. 
In addition, the transition rules were changed to the following: In a state M £ 
Qm'- 

1. If M is a perfect matching, we choose an edge e = {u, w} from M uniformly 
at random. Remove e from M and select M' «— M \ {e}. 

2. If M is a near-perfect matching where u and v are unmatched vertices, we 
choose a vertex z £ U UV uniformly at random. 

2a) If z is one of the unmatched vertices u and v and e = {it, v} £ E , 
then add e to M and select M' <— M U {e}. 

2b) If z £ V, {it, z} £ E and {x, z} £ M, then remove the edge { x , z} 
from M, add {it, z} and select M' <— ( M \ {{x, 2 }}) U {{it, z}}. 

2c) If z £ [/, {z,v} £ E and {z,y} £ M, then remove the edge {z,y} 
from M, add {z, i>} and select M' £- (M \ {{z, 1 /}}) U {{ 0 , it}}. 

The proposal probability for two states M, M’ £ fljvf is therefore 

1/n when moving between perfect and near-perfect matchings and l/(2n) when 
moving between near-perfect matchings. In connection with Algorithm [TJ this 
Markov chain converges to a stationary distribution 7r which is proportional to 
the weight function w. Defining M as the set of perfect matchings of G, and 
Af u ,v as the set of near perfect matchings in G where it and v are unmatched, 
the weight function suggested in [18] is defined as: 


w{M) 


1, M £ M 

\M\/\M u ,vl M £ Af u ,v 


( 11 ) 


Knowing the values of \A4\ and |A/" UiU |, the total mixing time of this chain can 
be bounded from above by the following polynomial m- 

r(e) £ 0(n 4 ln((7r min • e) _1 ). (12) 


Unfortunately, |Ad| and |7V" u>t ,| are usually not known in practice. Jerrum, Sin¬ 
clair and Vigoda gave a description of a procedure for approximating these 
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quantities in polynomial time |l&j . In our experiments though, we knew the 
exact values since we had a list of all states. In combination with Algorithm [Q 
this Markov chain is rapidly mixing for all bipartite graphs and can be used for 
sampling of perfect matchings. We refer to this chain as Matching Chain Two. 

Markov Chains for Sampling of Bipartite Graph Realiza¬ 
tions 

The bipartite graph realization problem is to find, for a pair of non-increasing 
integer sequences (a±,... ,a n ), (bi,... ,b n >) with at,bi > 0, a bipartite graph 
G = ([/, V, E), without loops and parallel edges, such that the vertex degrees of 
Ui £ U matches the a, and that the vertex degrees of Vi £ V match the 6j. Such 
a graph G is then called bipartite graph realization or, shorter still, realization. 
The set of all realizations of a given degree sequence pair makes the sampling 
set flu for this sampling problem. The bipartite graph realization problem is so 
important in many fields that it was reinvented several times. It is also known as 
the problem of contingency tables with given marginal sums or, as matrices with 
fixed row and column sums. As a direct application of random bipartite graphs, 
scientists often use a sampled realization as a null model to prove statistical 
hypotheses. Since sampling of random bipartite graphs is important in a wide 
range of fields, we chose it as a second example for our experiments. 

In contemporary literature, one can find two possible Markov chains that 
solve the problem of uniform sampling of a bipartite graph realization. One of 
these approaches is simply to transform it into the perfect matching sampling 
problem [20]. This way, it is known that the running time of the sampling 
procedure is bounded polynomially. However, the very simple and intuitive 
switch chain , presented by Kannan et al. EH, is widely used in practice. 

Switch Chain One At a state G = ( U,V,E ) G £Ir, choose four integers 
i,j,k,l with 1 < i < k < n and 1 < j < l < n' uniformly at random. One of 
three cases occur: 

1. If the edges ei = {ui,Vj} and ei = {uk,vi} exist in E but the edges 
e[ = { Ui,vi } and e' 2 = {uk,Vj} do not, then select G' = (V,U,E') with 
E'<— (E\ {ei, e 2 }) U {ei, e' 2 }. 

2. Symmetrically, if ei = { Ui,Vj } and e 2 = {uk,vi} both do not exist in E 
but the edges e\ = {ui,vi} and e 2 = {uk,Vj} exist, select G' = (V,U,E') 
with E' <r- (E\ {ei, e 2 }) U {ei, e 2 }. 

3. In all other cases, select G’ •<— G. 

Such a switch , i.e. changing the endpoints of two edges, does not change the 
degree sequence and thus constructs a new bipartite graph realization of S. It 
turns out, all graph realizations can be generated by starting with a given graph 
realization and applying a number of switches [2j2]. The proposal probability 
k(G, G') of a non-loop transition is n ^ +1 ^ • n q T p +1 ) • Using unit weights w(x) =1 


13 




for all x £ f Iji Algorithm [T| converges to a uniform stationary distribution. 
Kannan et al. m proved polynomially bounded mixing times of this chain for 
regular sequence pairs. Miklos et al. [23] extended the proof to half-regular 
sequence pairs. For all other sequence classes, the problem of the rapid mixing 
property is still open. We will refer to this Markov chain as Switch Chain One. 
Recently, Greenhill gave a proof that the total mixing time of the directed, non- 
bipartite version of Switch Chain One is bounded by the following polynomial 
when the largest degree d max of the input sequence lies in the range 3 < d max < 
j^/\~E\, where \E\ is the number of edges in the realization [23]: 

r(e) < ^CJ£| 9 (|£| In \E\ + He- 1 )). (13) 

Switch Chain Two A modification of Switch Chain One, given by Berger et 
al. [25] for directed graph sequences, can also be used for the bipartite version. 
The main idea is to reduce the number of transition loops in the state graph. To 
avoid non-convergence, an additional additional artificial edge {mo,uo}, w hich 
connects two additional vertices Uq and i>o, is added to the set of edges. In a 
state G = (U,VH) G select a pair of non-adjacent edges e\ = {iii,Vj} and 
e 2 = {ufc, vi} from the set E U {{wo, A)}}- 

1. If, ei = {uo,'Co} or e 2 = {mo,^}) select G' <— G. 

2. If e[ = {ui,vi} 0 E and e' 2 = {uk,Vj} 0 E , then switch the edges 
(ei, e' 1} e 2 , e 2 ) and select G' = (V, U, E') with E' t— (E\ {ei, e 2 })U{e' 1 , e' 2 j. 

3. In all other cases, select G' <— G. 

Without the additional edge, a problem would occur if no adjacent edge pairs 
existed for each graph realization of a sequence; then, the state graph would 
not possess loops. Similar arguments like those of Ref. 25] show that this 
chain converges to the uniform distribution. This chain reduces the average 
probability of transition loops and, as it will turn out, the average total mixing 
time. However, the asymptotical total mixing time is the same as for Switch 
Chain One. We will refer to this chain as Switch Chain Two. 


Experiments 

We demonstrate our library to experimentally analyse the mixing time be¬ 
haviour of the Markov chains. The goal of our experiments is to quantify the 
gap between the total mixing time and the various bounding methods. We are 
asking the question, whether the bounding techniques are tight or if there is 
a huge gap between total mixing time and its bounds. The latter case would 
support the thesis that the total mixing time actually is far smaller than known 
in theory. This would be a welcomed message for any programmer who wants to 
implement a sampling method and does not want to choose a highly polynomial 
(or even exponential) number of steps. We want to make some general remarks 
before beginning to describe our experiments. 
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The choice of e Recall, that the total mixing time r(e), as well as its bounds 
depends on the value of e which defines the distance to the stationary distribu¬ 
tion at the end of the random walk. For the purpose of our experiments, we 
choose e = 10 -3 . Furthermore, for other values of e we get similar results. 

Applying canonical paths To compute the congestion bounds, we use the 
path construction schemes from the original proofs and apply them to concrete 
state graphs. As those path construction schemes were used to gain theoretical 
bounds without full knowledge of a Markov chain’s structure, applying them to 
concrete state graphs shows how sharp such bounds could be if full structural 
information were available. Consequently, the bounds in general cannot be 
better than the upper bounds that are gained when applying the scheme on 
an actual state graph. To understand, how sharp the theoretical bounds can 
be, we analyse the differences to the actual total mixing times. In the case of 
matching chains, we use the construction scheme presented in m which was 
used to gain the upper bound given in Equation [TO] In case of the switch chains 
we apply the definition of the canonical paths from m- For the definition of the 
path construction schemes, we refer to the original articles. We tried as close as 
possible to implement the original canonical path construction rules. However, 
the definition often relies on an arbitrary order of the vertices, or orderings of 
cycles. It turned out that changing these orders has an effect on the quality of 
the congestion bound. However, as the proofs for the upper bounds are valid 
for all orderings, this is not a problem for our observations. 

Hardware Usage Since we wanted to compute the properties of a large 
amount of instances, we decided to compute as many properties of a state 
graph as possible in parallel. As shown before, the computation of the total 
mixing time is the bottleneck for our computations. For this reason, we decided 
to transfer this task to the GPU whenever possible. This has the advantage, 
that the free CPU cores can be used for the computation of other quantities, 
while the GPU is occupied. With this strategy, we achieve a large throughput 
of instances. Fig. [2] shows the hardware usage while computing the properties 
of a Markov chain instance. 

Enumeration Experiment 

In our first experiment, we wanted to gain an impression concerning the dimen¬ 
sion of the total mixing time and its bounds. 

Experimental Setup Firstly, we enumerated all small input instances up to a 
given limit for each chain. In particular, we enumerated 19,378 bipartite degree 
sequence pairs with up to 6+6 vertices as input for the switch chains and 89,242 
connected non-isomorphic bipartite graphs with 6+6 vertices as input for the 
matching chains. While the first task was done with a simple backtracking proce¬ 
dure, the latter task was done by the command line tool nauty [26j . These com- 
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Spectral Bound] 
Congestion Bound 
Total Mixing Time I 


Figure 2. Typical hardware usage for our experiments. At first, the 
state graph is being constructed as a sequential task. Afterwards, the 
properties of the state graph are being computed in parallel. When each 
property is computed the next instance is processed in the same manner. 


binatorial objects served as input instances for the Markov chains. Even though 
the size of input is small, the resulting state graphs can be large. For exam¬ 
ple, the largest state graph processed in this experiment had 297,200 states and 
corresponds to the regular degree sequence pair (3,3, 3,3, 3,3),(3,3, 3,3, 3,3). 
Since both the number of instances, as well as the size of the state graphs grow 
exponentially, we were not able to perform such an experiment for a much larger 
instance size. For each instance, we constructed the corresponding state graph. 
For all state graphs with a maximum of 20,000 states, we computed the fol¬ 
lowing properties: a) total mixing time, b) spectral bound, and c) canonical 
path congestion bound. Although we could easily compute the latter bounds 
for much larger state graphs, we did not do so, since we need the total mixing 
time for comparison. 

Results In Fig. 0 we show the results of the enumeration experiment. Each 
input instance corresponds to two data points, showing the ratio of the upper 
spectral bound (green), and congestion bound (blue) with the total mixing time. 
We immediately observe that the spectral bound is very close to the total mixing 
time for all instances, in contrast to the congestion bound. More precisely, in 
case of Matching Chain Two, the congestion bound is up to 800 times larger than 
the total mixing time. Similar observations can be made for the other chains. 
In Fig. U we highlight the instances which are known to have a polynomial 
congestion bound. In the case of the switch chains, these are regular and half¬ 
regular degree sequences. In the case of Matching Chain One, these are dense 
bipartite graphs. Matching Chain Two is known to be rapidly mixing for all 
instances. We find that the observations can also be confirmed within this set. 

Scaling Experiment 

We wanted to know whether the observed effects remain valid for larger in¬ 
stances. The main problem is, that the number of instances for our Markov 
chains, both degree sequences and bipartite graphs, grows exponentially with 
input size. So, a complete enumeration becomes infeasible when the instance 
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Figure 3. The quality of the upper bounds. The quotient of congestion 
bound (blue) and total mixing time, respectively upper spectral bound (green) 
and total mixing time versus the size of the corresponding state graph. 


size grows. To gain some insights, we picked selected instances for the switch 
chains and scaled them to larger size. 

Experimental Setup We picked a half-regular sequence pattern (n — l,n — 
2, 2,1), (2, 2,..., 2), where the number of two’s in the second sequence is n. The 
parameter n can be used to scale the instance to different sizes. In doing so, a 
bipartite graph which realizes this degree sequences has n + 4 vertices and 2 n 
edges. These sequence pairs are half-regular and so, it is known that the switch 
chains are rapidly mixing. For each n between 4 and 50, we constructed the 
corresponding state graph and computed the total mixing time and its bounds. 
Again, due to the large memory consumption, we stopped computing the total 
mixing time when |fi| exceeded 20,000. To predict missing values of total mixing 
time for larger state graphs up to about 600,000 states, we used a linear model 
based on the following observation. 

Results We observed that the lower spectral bound seems to have a linear 
relationship with the total mixing time. This allowed us to predict the missing 
total mixing time values using a simple linear model (see Fig. [5] for details). To 
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Figure 4. The quality of the upper bounds for rapidly mixing 
instances. The results of Fig. [3] filtered to highlight instances with known 
polynomial mixing time. Instances with no known polynomial bound are 
coloured gray. 


quantify the gap between the bounds and total mixing time, we again divided 
each bound through the corresponding total mixing time or its predicted value. 
Fig.|6]shows the result of this experiment. We observed that the gap between the 
congestion bound and the total mixing time grows with input size n. Further¬ 
more, we observed that the same is true for the upper spectral bound, but in a 
much slower way. We repeated this experiment with other half-regular sequence 
patterns and observed similar effects. In particular, we changed our sequence 
pattern (n — 1, n — 2, 2,1), (2, 2,..., 2), which we call type A, slightly to the se¬ 
quences patterns (n — 1, n — 2,3), (2, 2,..., 2) (type B) and (n — 1, n — 2,1,1,1), 
(2, 2,..., 2) (type C). Fig[7]shows the, partially estimated, total mixing time for 
each sequence type. We observed that each type has its own growing asymp¬ 
totic. The data suggests, in the case of Switch Chain One, the total mixing 
time of sequence type A grows with 0(?r 2 2 '), of type B with 0(n 116 ), and of 
type C with 0(n 2A3 ). In the case of Switch Chain Two, we observe similar 
effects. These results are interesting because these bounds are lower bounds on 
the mixing time of the switch chains for all half-regular sequence pairs. Since 
two of them are clearly super-linear, a random walk with a linear number of 
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Figure 5. Relationship between the lower spectral bound and the 


total mixing time. The total mixing time is shown in connection to a 
corresponding lower spectral bound for sequence pairs of the form 
(n — 1, n — 2,2,1), (2, 2,..., 2). We use the displayed formulas to predict 
missing values for total mixing time. 
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Figure 6. Quality of the upper bounds with growing instance size. 

The quotient of congestion bound (blue), respectively upper spectral bound 
(green) with total mixing time is shown as a function of the instance size. A 
dotted line shows the point where we stop computing the total mixing time 
and start using the lower spectral bound approximation. 


steps must ultimately fail and will result in a non-random sample. 

Loop Reduction Experiment 

From the data of the enumeration experiment, we observed that the total mixing 
time does not correlate with the size of the state graph (see Fig. [8]). Instead, 
the largest mixing time appears at input instances corresponding to small state 
graphs. For example, the sequence pair (6, 6,6, 6, 5, 5),(6, 6,6, 6, 5, 5), possesses 
just two different states but is beneath the instances with largest total mixing 
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Figure 7. Growing asymptotic for the total mixing time of 
half-regular sequence pairs. The observed mixing time curves for the 
degree sequence pairs of type A are coloured violet, of type B red, and of type 
C orange. 
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Figure 8. Number of states versus total mixing time. Each data point 
represents one input instance from our set of input instance. Instances with 
known polynomial mixing time are highlighted red. 
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time in both switch chains. To explain this effect, consider sequence pairs of 
the general form (n, n,... ,n — 1 ,n — 1), (n, n ,..., n — 1 , n — 1 ), where 2 n is 
the number of vertices in the corresponding bipartite graph. It is apparent that 
such sequence pairs posses two different realizations, meaning, their state graph 
have two states. Now consider Switch Chain One; for a given n, the number 
of ways to choose i, j, k, l is (n ■ (n + l)/2) 2 £ 0(n A ). Due to the large number 
of edges in the sequence pairs, all but one of these choices result in loops. As 
a consequence, the expected number of steps, just to get from one state to the 
other, is 0(n 4 ), which is immediately a lower bound for the mixing time of 
Switch Chain One. The fact that an instance with polynomially bounded total 
mixing time makes the largest mixing time over all instances is likely an effect 
of the small input size. To better understand this effect, we decided to further 
reduce the influence of the loops. 

Experimental Setup We repeated the enumeration experiment but made 
some artificial modifications on each state graph before computing its properties. 
For each state graph we determined the minimal loop probability over all states 
P m in := mm.igfi.Pii. We wanted to reduce the transition probability of each 
loop in the state graph by this quantity. However, to avoid the removal of all 
loops and possible problems with convergence, we removed only 99% of this 
probability from the loops. This way, we gained new transition probabilities of 


P/i £- Pa - .99P min for all i £ fi. 


(14) 


The amount of .99P m ; n which was reduced from the probability mass of each 
state, was restored to the network through rescaling of the remaining transition 
arcs: 



(15) 


for all i,j £ Cl. 


By subtracting the same amount of probability from each loop, we kept the 
symmetry of the transition matrix so that the chain’s stationary distribution 
was still uniform. The result was a state graph with the same structure but 
drastically reduced loop probability. In such state graphs, the mixing times are 
much more dependent on the structure of the state graph than on the number 
of the loops. We repeated the computation of the total mixing time and its 
upper bounds with those modified state graphs for Switch Chain One. 

Results We first observed that the modification indeed had a strong effect 
on the total mixing time (see Fig. [9]). While the largest total mixing time in 
our instance set is about 1400, in the set of reduced state graphs, it is 63. In 
addition, we observed that the total mixing time after the reduction steps no 
longer correlates with the average loop probability. So, the effects we observe 
now are more or less detached from the influence of the loops. We observed, that 
the effect we observed in the enumeration experiment still remain (see Fig llOl) . 
The quality of the congestion bound is still bad while the largest total mixing 
time occurs at relatively small state graphs. We finished our experiments with 
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Figure 9. Influence of the average loop probability on the total 
mixing time. Each data point represents a state graph, for each input 
instance of Switch Chain One, before and after loop reduction. 
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Figure 10. Properties of reduced state graphs. (A) Size of a state graph 
versus its total mixing time. Regular and half-regular sequence pairs are 
highlighted red. (B) Quotient of congestion bound (blue) and upper spectral 
bound (green) with corresponding total mixing time. 


the observation that the upper spectral bound as well as the congestion bound 
heavily depend on the average vertex degree of the state graph (see Fig. ED- 
We concluded that the average degree is a structural property of a state graph 
which has a stronger influence on the total mixing time than its size. 


Results and Discussion 

We introduced the marathon library and demonstrated how it can be used to 
support the analysis of sampling algorithms by a set of experiments. We briefly 
summarise the most important observations from our experiments: 

• The congestion bound is much larger than the total mixing time in almost 
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Figure 11. Influence of the average vertex degree. Connection between 
average vertex degree of a state graph and its total mixing time, respectively 
canonical path bound. 


all cases. The quality of the congestion bounds deteriorates with growing 
input size. This observation is made in reduced and non-reduced state 
graphs. 

• The upper spectral bound keeps close to total mixing time even when the 
input size grows, although it can be observed, that its quality drops slowly. 
Again, this observation is made in reduced and non-reduced state graphs. 

• The lower spectral bound seems to have an almost linear relationship with 
the total mixing time. This way, it can be used to precisely predict the 
total mixing time. The search for an explanation of this effect is likely a 
good subject for further theoretical studies. 

• Maybe the most surprising observation made in our experiments is that 
large total mixing time tends to occur at small state graphs. 

• The average vertex degree of a state graph has a strong relationship on 
its total mixing time. This observation should be investigated further. 
Figuring out which sequences have a tendency towards more diverse ver¬ 
tex degrees in their state graphs could lead to further interesting rapidly 
mixing graph classes. 

Our observations indicate that the theoretical bound gained by the canonical 
path method is likely too pessimistic. Moreover, although we know the exact 
structure of a state graph in our experiments, which can never be the case in a 
normal practical scenario, the congestion bound is too large. This leads us to 
the conclusion that the canonical path method will never yield applicable values. 
Furthermore, this gives hope to the hypothesis, that the true total mixing time 
is smaller than existing theory is able to prove. So, we are optimistic about the 
applicability of the switch chains. We think that a promising next step could be 
to focus on two fields which need one another: investigating and the proving of 
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special structures for state graphs depending on the problem and figuring out 
new relationships between structures of graphs and its eigenvalues. 

Future Work 

One of our next steps is to use our new tool very extensively. We want to 
investigate further chains and test further hypotheses. Moreover, we plan to 
add further functionality to the marathon library to enable additional research 
questions. For example, we want to compute the distribution of the mixing time. 
While the total mixing time displays the maximal number of steps necessary 
to drop below a total variation distance of e, the mixing time as a function of 
the starting state would lead to further insights, such as average mixing time. 
Another additional feature would be the computation of a multi commodity 
flow congestion scheme, which is a generalization of the canonical path method. 
Furthermore, we want to improve the performance of existing methods. In 
particular, we want to add multi GPU support as well as a parallel method 
for the construction of the state graph. Another topic would be the parallel 
evaluation of the canonical path congestion method using GPUs. 
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